7 research outputs found

    Glottal Source Information for Pathological Voice Detection

    No full text
Automatic methods for distinguishing pathological voice from healthy speech can be considered potential clinical tools for medical treatment. This study investigates the effectiveness of glottal source information in the detection of pathological voice by comparing the classical pipeline approach to the end-to-end approach. The traditional pipeline approach consists of a feature extractor and a separate classifier. In the former, two sets of glottal features (computed using the quasi-closed phase glottal inverse filtering method) are used together with the widely used openSMILE features. Using both the glottal and openSMILE features extracted from voice utterances and the corresponding healthy/pathology labels, support vector machine (SVM) classifiers are trained. In building end-to-end systems, both raw speech signals and raw glottal flow waveforms are used to train two deep learning architectures: (1) a combination of a convolutional neural network (CNN) and a multilayer perceptron (MLP), and (2) a combination of a CNN and a long short-term memory (LSTM) network. Experiments were carried out using three publicly available databases covering dysarthric (the UA-Speech and TORGO databases) and dysphonic (the UPM database) voices. The performance analysis of the detection system based on the traditional pipeline approach showed the best results when the glottal features were combined with the baseline openSMILE features. The results of the end-to-end approach indicated higher accuracies (an improvement of about 2–3 % in all three databases) when glottal flow was used as the raw time-domain input (87.93 % for UA-Speech, 81.12 % for TORGO and 76.66 % for UPM) compared to using the raw speech waveform (85.12 % for UA-Speech, 78.83 % for TORGO and 73.71 % for UPM). The evaluation of both approaches demonstrates that automatic detection of pathological voice benefits from using glottal source information.
Peer reviewed
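For a concrete picture of the second end-to-end architecture, the sketch below pairs a 1-D CNN front end with an LSTM over raw time-domain input (speech or glottal flow). It is a minimal PyTorch sketch only: the filter counts, kernel sizes, pooling factors and the one-second 16 kHz input are illustrative assumptions, not the configuration reported in the study.

```python
# Minimal sketch of a CNN + LSTM binary detector on raw time-domain
# input (speech or glottal flow). Layer sizes are illustrative
# assumptions, not the paper's exact configuration.
import torch
import torch.nn as nn

class CNNLSTMDetector(nn.Module):
    def __init__(self, n_filters=64, lstm_hidden=128):
        super().__init__()
        # 1-D convolutions learn frame-level representations
        # directly from the raw waveform.
        self.cnn = nn.Sequential(
            nn.Conv1d(1, n_filters, kernel_size=80, stride=4),
            nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(n_filters, n_filters, kernel_size=3),
            nn.ReLU(),
            nn.MaxPool1d(4),
        )
        # The LSTM aggregates the frame sequence over time.
        self.lstm = nn.LSTM(n_filters, lstm_hidden, batch_first=True)
        self.out = nn.Linear(lstm_hidden, 2)  # healthy vs. pathological

    def forward(self, wav):                   # wav: (batch, samples)
        x = self.cnn(wav.unsqueeze(1))        # -> (batch, filters, frames)
        x, _ = self.lstm(x.transpose(1, 2))   # -> (batch, frames, hidden)
        return self.out(x[:, -1])             # logits from last time step

logits = CNNLSTMDetector()(torch.randn(8, 16000))  # 8 one-second clips
```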

    Automatic intelligibility assessment of dysarthric speech using glottal parameters

    No full text
Objective intelligibility assessment of dysarthric speech can assist clinicians in the diagnosis of speech disorders as well as in medical treatment. This study investigates the use of glottal parameters (i.e. parameters that describe the acoustical excitation of voiced speech, the glottal flow) in the automatic intelligibility assessment of dysarthric speech. Instead of directly predicting the intelligibility of dysarthric speech using a single-stage system, the proposed method utilizes a two-stage framework. In the first stage, two-class severity classification of dysarthria is performed using support vector machines (SVMs). In the second stage, the intelligibility of dysarthric speech is estimated using a linear regression model. Two sets of glottal parameters are explored: (1) time-domain and frequency-domain parameters and (2) parameters based on principal component analysis (PCA). Acoustic parameters proposed in a similar intelligibility prediction study by Falk et al. [1] are used as baseline features. Evaluation results show that the two-stage framework leads to improvement in the intelligibility assessment measures (correlation and root mean square error) compared to the single-stage framework. The combination of the glottal parameter sets results in better performance in the severity classification and intelligibility estimation tasks compared to the baseline features.
Peer reviewed
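The two-stage framework can be sketched in a few lines of scikit-learn. The arrays below are random placeholders standing in for glottal feature vectors, severity labels and intelligibility scores; the point is only the structure: an SVM picks the severity class, then a per-class linear regressor predicts intelligibility.

```python
# Minimal sketch of the two-stage framework: an SVM assigns each
# utterance a severity class, then a linear regression model trained
# per class predicts the intelligibility score. All data are
# placeholders.
import numpy as np
from sklearn.svm import SVC
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))            # glottal feature vectors
severity = rng.integers(0, 2, size=200)   # 0 = mild, 1 = severe
intelligibility = rng.uniform(0, 100, size=200)

# Stage 1: two-class severity classification.
clf = SVC(kernel="rbf").fit(X, severity)

# Stage 2: one linear regressor per severity class.
regs = {c: LinearRegression().fit(X[severity == c],
                                  intelligibility[severity == c])
        for c in (0, 1)}

def predict_intelligibility(x):
    c = clf.predict(x.reshape(1, -1))[0]
    return regs[c].predict(x.reshape(1, -1))[0]

print(predict_intelligibility(X[0]))
```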

    Automatic assessment of intelligibility in speakers with dysarthria from coded telephone speech using glottal features

    No full text
In clinical practice, the assessment of intelligibility in speakers with dysarthria is performed by speech-language pathologists through auditory perceptual tests, which demand the patients' presence at hospital and involve time-consuming examinations. Frequent clinical monitoring can be costly and logistically inconvenient both for patients and medical experts. Here, we aim to automate the assessment of intelligibility in dysarthric speakers with an objective, speech-based method that can be employed in a telescreening application. The proposed method predicts the level of intelligibility in dysarthric speakers using four levels of speech intelligibility (very low, low, mediocre and high). The study compares several automatic methods to assess the intelligibility level in speakers with dysarthria by utilizing information generated at the level of the vocal folds through glottal features and by using coded telephone speech (i.e. the kind of speech used in telescreening applications). In addition to the glottal features, the openSMILE features are used as acoustic baseline features. Using features obtained from coded speech utterances and the corresponding intelligibility level labels, multiclass support vector machine (SVM) classifiers are trained. A separate set of multiclass SVMs is trained using the individual glottal and acoustic features as well as their combinations. Coded telephone speech is generated with the adaptive multi-rate codec with two operational bandwidths (narrowband and wideband) from utterances of an open database of dysarthric speech (Universal Access-Speech). Experimental results showed good classification accuracies for the glottal features, indicating their effectiveness in intelligibility level assessment in speakers with dysarthria even in the challenging coded condition. Improvement in classification accuracy was obtained when the glottal features were combined with the openSMILE acoustic features, which validates the complementary nature of the glottal features.
Peer reviewed
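A minimal sketch of the four-level classification step follows, assuming the audEERING opensmile Python package for the acoustic baseline. The eGeMAPSv02 feature set, the file names and the one-utterance-per-level toy data are placeholders, not the study's setup.

```python
# Minimal sketch of four-level intelligibility classification with a
# multiclass SVM over openSMILE functionals. Paths and labels are
# placeholders.
import numpy as np
import opensmile
from sklearn.svm import SVC

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)

files = ["utt_%03d.wav" % i for i in range(4)]           # coded speech
labels = ["very low", "low", "mediocre", "high"]          # one per file

# One functional (utterance-level) feature vector per file.
X = np.vstack([smile.process_file(f).to_numpy() for f in files])
clf = SVC(decision_function_shape="ovo").fit(X, labels)   # multiclass SVM
print(clf.predict(X[:1]))
```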

    Dysarthric speech classification from coded telephone speech using glottal features

    No full text
This paper proposes a new method for dysarthric speech classification from coded telephone speech using glottal features. The proposed method utilizes glottal features, which are efficiently estimated from coded telephone speech using a recently proposed deep neural network-based glottal inverse filtering method. Two sets of glottal features were considered: (1) time- and frequency-domain parameters and (2) parameters based on principal component analysis (PCA). In addition, acoustic features are extracted from coded telephone speech using the openSMILE toolkit. The proposed method utilizes both acoustic and glottal features extracted from coded speech utterances and their corresponding dysarthric/healthy labels to train support vector machine classifiers. Separate classifiers are trained using the individual glottal and acoustic features as well as their combination. The coded telephone speech used in the experiments is generated using the adaptive multi-rate codec, which operates in two transmission bandwidths: narrowband (300 Hz - 3.4 kHz) and wideband (50 Hz - 7 kHz). The experiments were conducted using dysarthric and healthy speech utterances of the TORGO and Universal Access-Speech (UA-Speech) databases. Classification accuracy results indicated the effectiveness of glottal features in the identification of dysarthria from coded telephone speech. The results also showed that the glottal features in combination with the openSMILE-based acoustic features resulted in improved classification accuracies, which validates the complementary nature of the glottal features. The proposed dysarthric speech classification method can potentially be employed in telemonitoring applications for identifying the presence of dysarthria from coded telephone speech.
Peer reviewed
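Generating the coded telephone speech can be approximated with an encode-decode round trip through ffmpeg, assuming a build that includes the opencore-amr (narrowband) and vo-amrwbenc (wideband) encoders; the file paths are placeholders.

```python
# Minimal sketch of producing AMR-coded telephone speech for the
# narrowband (8 kHz) and wideband (16 kHz) conditions via ffmpeg.
# Assumes ffmpeg was built with libopencore_amrnb / libvo_amrwbenc.
import subprocess

def amr_roundtrip(wav_in, wav_out, wideband=False):
    codec = "libvo_amrwbenc" if wideband else "libopencore_amrnb"
    rate = "16000" if wideband else "8000"
    # Encode to AMR, then decode back to PCM for feature extraction.
    subprocess.run(["ffmpeg", "-y", "-i", wav_in, "-ar", rate,
                    "-ac", "1", "-c:a", codec, "tmp.amr"], check=True)
    subprocess.run(["ffmpeg", "-y", "-i", "tmp.amr", wav_out], check=True)

amr_roundtrip("clean.wav", "coded_nb.wav", wideband=False)
amr_roundtrip("clean.wav", "coded_wb.wav", wideband=True)
```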

    Dysarthric speech classification using glottal features computed from non-words, words and sentences

    No full text
Dysarthria is a neuro-motor disorder resulting from the disruption of normal activity in speech production, leading to slow, slurred and imprecise speech of low intelligibility. Automatic classification of dysarthria from speech can be used as a potential clinical tool in medical treatment. This paper examines the effectiveness of glottal source parameters in dysarthric speech classification from three categories of speech signals, namely non-words, words and sentences. In addition to the glottal parameters, two sets of acoustic parameters extracted by the openSMILE toolkit are used as baseline features. A dysarthric speech classification system is proposed by training support vector machines (SVMs) using features extracted from speech utterances and their dysarthric/healthy labels. Classification accuracy results indicate that the glottal parameters contain discriminating information required for the identification of dysarthria. Additionally, the complementary nature of the glottal parameters is demonstrated when these parameters, in combination with the openSMILE-based acoustic features, result in improved classification accuracy. An analysis of the classification accuracies of the glottal and openSMILE features for non-words, words and sentences is carried out. Results indicate that, in terms of classification accuracy, the word level is best suited to identifying the presence of dysarthria.
Peer reviewed
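The glottal/acoustic/fused comparison reduces to early feature fusion by concatenation. The sketch below runs it with random placeholder arrays per speech category; the feature dimensionalities are illustrative assumptions.

```python
# Minimal sketch of the per-category comparison: train an SVM on
# glottal features, acoustic features and their concatenation, then
# compare cross-validated accuracy. All arrays are random placeholders.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=120)              # dysarthric vs. healthy
for category in ("non-words", "words", "sentences"):
    glottal = rng.normal(size=(120, 12))
    acoustic = rng.normal(size=(120, 88))     # e.g. eGeMAPS functionals
    fused = np.hstack([glottal, acoustic])    # simple early fusion
    for name, X in (("glottal", glottal), ("acoustic", acoustic),
                    ("fused", fused)):
        acc = cross_val_score(SVC(), X, y, cv=5).mean()
        print(f"{category:>10s} | {name:>8s} | acc={acc:.2f}")
```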

    Glottal source estimation from coded telephone speech using a deep neural network

    No full text
In speech analysis, information about the glottal source is obtained from speech by using glottal inverse filtering (GIF). The accuracy of state-of-the-art GIF methods is sufficiently high when the input speech signal is of high quality (i.e., with little noise or reverberation). However, in realistic conditions, particularly when GIF is computed from coded telephone speech, the accuracy of GIF methods deteriorates severely. To robustly estimate the glottal source under the coded condition, a deep neural network (DNN)-based method is proposed. The proposed method utilizes a DNN to map the speech features extracted from the coded speech to the glottal flow waveform estimated from the corresponding clean speech. To generate the coded telephone speech, the adaptive multi-rate (AMR) codec, a widely used speech compression method, is utilized. The proposed glottal source estimation method is compared with two existing GIF methods, closed phase covariance analysis (CP) and iterative adaptive inverse filtering (IAIF). The results indicate that the proposed DNN-based method is capable of estimating glottal flow waveforms from coded telephone speech with considerably better accuracy than CP and IAIF.
Peer reviewed
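The core of the method is a regression DNN trained to map features of coded speech to glottal flow samples obtained from clean speech. A minimal PyTorch sketch follows; the layer sizes, feature dimension and frame length are illustrative assumptions, and random tensors stand in for real training data.

```python
# Minimal sketch of the DNN mapping: features of coded speech in,
# glottal flow samples (estimated from the corresponding clean speech
# by a reference GIF method) out. Dimensions are illustrative.
import torch
import torch.nn as nn

feat_dim, flow_dim = 40, 400   # spectral features -> one flow frame
dnn = nn.Sequential(
    nn.Linear(feat_dim, 512), nn.Tanh(),
    nn.Linear(512, 512), nn.Tanh(),
    nn.Linear(512, flow_dim),  # regression output: glottal flow frame
)
opt = torch.optim.Adam(dnn.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

coded_feats = torch.randn(64, feat_dim)   # from AMR-coded speech
clean_flow = torch.randn(64, flow_dim)    # target from clean-speech GIF

for _ in range(10):                       # a few illustrative steps
    opt.zero_grad()
    loss = loss_fn(dnn(coded_feats), clean_flow)
    loss.backward()
    opt.step()
```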

Duration of the rhotic approximant /ɹ/ in spastic dysarthria of different severity levels

    No full text
Dysarthria is a motor speech disorder leading to imprecise articulation of speech. Acoustic analysis capable of detecting and assessing articulation errors is useful in dysarthria diagnosis and therapy. Since speakers with dysarthria experience difficulty in producing rhotics due to the complex articulatory gestures of these sounds, the hypothesis of the present study is that the duration of the rhotic approximant /ɹ/ distinguishes dysarthric speech of different severity levels. Duration measurements were conducted using the third formant (F3) trajectories estimated from quasi-closed-phase (QCP) spectrograms. Results indicate that the severity level of spastic dysarthria has a significant effect on the duration of /ɹ/. In addition, the phonetic context has a significant effect on the duration of /ɹ/, with the I-r-E context showing the largest difference in /ɹ/ duration between dysarthric speech of the highest severity levels and healthy speech. The results of this preliminary study can be used in the future to develop signal processing and machine learning methods to automatically predict the severity level of spastic dysarthria from speech signals.
Peer reviewed
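The duration measurement itself can be sketched as follows: track F3 over the utterance, locate the rhotic as the region where F3 dips (rhotics lower F3 markedly) and report its duration. Praat's Burg tracker (via parselmouth) stands in here for the paper's QCP-based analysis; the file path and the 2 kHz cutoff are illustrative assumptions.

```python
# Minimal sketch of /ɹ/ duration measurement from an F3 trajectory.
# A standard formant tracker replaces the paper's QCP spectrograms.
import numpy as np
import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("utterance.wav")         # placeholder path
formants = snd.to_formant_burg(time_step=0.005)

times = np.arange(0.025, snd.duration - 0.025, 0.005)
f3 = np.array([call(formants, "Get value at time", 3, t, "Hertz", "Linear")
               for t in times])

low = f3 < 2000        # Hz; NaNs (untracked frames) compare as False
if low.any():
    onset, offset = times[low][0], times[low][-1]
    print(f"/ɹ/ duration ≈ {offset - onset:.3f} s")
```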